Estimating speech from lip dynamics

نویسندگان

  • Jithin Donny George
  • Ronan Keane
  • Conor Zellmer
چکیده

The goal of this project is to develop a limited lip reading algorithm for a subset of the English language. We consider a scenario in which no audio information is available. The raw video is processed and the position of the lips in each frame is extracted. We then prepare the lip data for processing and classify the lips into visemes and phonemes. Hidden Markov Models are used to predict the words the speaker is saying based on the sequences of classified phonemes and visemes. The GRID audiovisual sentence corpus [10][11] database is used for our study.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

لب‌خوانی و ادراک گفتار دانش‌آموزان کم‌شنوای مدارس ویژۀ کم‌شنوایان در شهر تهران

Objective: The goal of this study was to evaluate the lip reading ability and Speech perception of hearing impaired students of special schools for the hearing impaired in different speech levels. Materials & Methods: In this cross- sectional study, 44 deaf students (9-12 years old) were selected with multi-stage cluster sampling method, from two special schools for the deaf in Tehran. Tools...

متن کامل

Visual analysis of lip coarticulation in VCV utterances

This paper presents an investigation of the visual variation on the bilabial plosive consonant /p/ in three coarticulation contexts. The aim is to provide detailed ensemble analysis to assist coarticulation modelling in visual speech synthesis. The underlying dynamics of labeled visual speech units, represented as lip shape, from symmetric VCV utterances, is investigated. Variation in lip dynam...

متن کامل

Improving speaker identification performance in reverberant conditions using lip information

This paper considers the improvment of speaker identification performance in reverberant conditions using additional lip information. Automatic speaker identification (ASI) using speech characteristics alone can be highly successful, however problems occur with mis-matches between training and testing conditions. In particular, we find that ASI performance drops dramatically when given anechoic...

متن کامل

Visual analysis of viseme dynamics

Face to face dialogue is the most natural mode of communication between humans. The combination of human visual perception of expression and perception in changes in intonation provides semantic information that communicates idea, feelings and concepts. The realistic modelling of speech movements, through automatic facial animation, and maintaining audio-visual coherence is still a challenge in...

متن کامل

Speech intelligibility after repair of cleft lip and palate

    Background: Intelligibility refers to understandability of speech; and lack of it can negatively affect children’s overall communication effectiveness. Children with repaired cleft lip and/or cleft palate (CL/P) may experience poor speech intelligibility. This study aimed at evaluating speech intelligibility in children with repaired CL/P who had not been referred to sp...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • CoRR

دوره abs/1708.01198  شماره 

صفحات  -

تاریخ انتشار 2017